The Grammar of Graphics (GoG) is a principled way of specifying exactly how to create a particular graph from a given data set. It helps us to systematically design new graphs.
Think of a graph or a data visualization as a mapping…
…FROM variables in the data set (or statistics computed from the data)…
…TO visual attributes (or “aesthetics”) of marks (or “geometric elements”) on the page/screen.
data: dataframe containing variables
aes: aesthetic mappings (position, color, symbol, …)
geom: geometric element (point, line, bar, box, …)
stat: statistical variable transformation (identity, count, linear model, quantile, …)
scale: scale transformation (log scale, color mapping, axes tick breaks, …)
coord: Cartesian, polar, map projection, …
facet: divide into subplots using a categorical variable
In ggplot2, complete this template to build a basic graphic:

ggplot(
  data = <DATA>,
  mapping = aes(<MAPPINGS>)
) +
  <GEOM FUNCTION>() +
  any other arguments...

Notice, every + adds another layer to our graphic.
Also notice that I’m using named arguments to make my code easier to read.
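As a minimal sketch of the template filled in, the example below assumes the mpg data set that ships with ggplot2 (the same data used later in these slides) and picks geom_point() as the geom function. The object name p_template is only illustrative.

```r
library(ggplot2)

# Fill in the template: <DATA> is mpg, <MAPPINGS> puts engine displacement
# on the x-axis and highway mileage on the y-axis, and the geom is points.
p_template <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy)
) +
  geom_point()

# Printing the object draws the plot.
p_template
```

Writing data = and mapping = explicitly is optional, but it makes clear which piece of the template each argument fills.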
We map variables (columns) from the data to aesthetics on the graphic using the aes() function.
What aesthetics can we set (see ggplot2 cheat sheet for more)?
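One way to see aes() at work is to map a third column to color in addition to the x and y positions. This sketch assumes the mpg data set and its drv (drive train) column; the choice of columns and the name p_aes are illustrative, not prescribed by the slides.

```r
library(ggplot2)

# Three mappings inside aes(): two position aesthetics and one color aesthetic.
# Each distinct value of drv gets its own point color and a legend entry.
p_aes <- ggplot(
  data = mpg,
  mapping = aes(x = displ, y = hwy, color = drv)
) +
  geom_point()

p_aes
```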
We use a geom_XXX() function to represent data points.
one variable
  geom_density()
  geom_dotplot()
  geom_histogram()
  geom_boxplot()
two variables
  geom_point()
  geom_line()
  geom_density_2d()
three variables
  geom_contour()
  geom_raster()
This is not an exhaustive list – see the ggplot2 cheat sheet.
To create a specific type of graphic, we will combine aesthetics and geometric objects.
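For instance, side-by-side boxplots come from pairing a categorical x aesthetic and a numeric y aesthetic with geom_boxplot(). The sketch below assumes the mpg data set and its class and hwy columns; p_box is an illustrative name.

```r
library(ggplot2)

# Aesthetics: vehicle class on x (categorical), highway mileage on y (numeric).
# Geometric object: one box per class.
p_box <- ggplot(
  data = mpg,
  mapping = aes(x = class, y = hwy)
) +
  geom_boxplot()

p_box
```

Swapping geom_boxplot() for another geom that accepts the same aesthetics, such as geom_violin(), changes the type of graphic without touching the mappings.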
facet_wrap(~ b): facets by one variable
  nrow controls the number of rows the facets are output into
  ncol controls the number of columns the facets are output into
facet_grid(a ~ b): facets by two variables
  a will be assigned to the rows
  b will be assigned to the columns
You can set scales to let axis limits vary across facets:
facet_grid(y ~ x, scales = ______)
  "free" – both x- and y-axis limits adjust to individual facets
  "free_x" – only x-axis limits adjust
  "free_y" – only y-axis limits adjust
A stat transforms an existing variable into a new variable to plot.
  identity leaves the data as is.
  count counts the number of observations.
  summary allows you to specify a desired transformation function.
Sometimes these statistical transformations happen under the hood when we use a specific geom_XXX().
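A short sketch of both cases, assuming the mpg data set: geom_bar() applies the count stat under the hood, while stat_summary() lets us name the transformation function ourselves (here mean). The object names are illustrative.

```r
library(ggplot2)

# count: geom_bar() uses stat = "count" by default, so we only supply x.
# The bar heights are computed from the data, not read from a column.
p_count <- ggplot(data = mpg, mapping = aes(x = class)) +
  geom_bar()

# summary: stat_summary() applies the function we pass to `fun`
# to the y values within each class, then draws the result as points.
p_mean <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")

p_count
p_mean
```

In the first plot no y aesthetic is mapped at all; the count stat creates the variable that ends up on the y-axis.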
Position adjustments determine how to arrange geoms that would otherwise occupy the same space.
position = "dodge": Arrange elements side by side.
position = "fill": Stack elements on top of one another + normalize height.
position = "stack": Stack elements on top of one another.
position = "jitter": Add random noise to x & y position of each element to avoid overplotting (see geom_jitter()).

ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_y_continuous(limits = c(0, 50),
                     breaks = seq(from = 0, to = 50, by = 5)
                     )

ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = cyl)
       ) +
  geom_jitter() +
  labs(x = "Engine Displacement (liters)",
       y = " ",
       color = "Cylinders",
       title = "Cars with More Cylinders Have Larger Engine Displacement\n and Lower Fuel Efficiency") +
  scale_color_gradient(low = "white", high = "green4")

It is good practice to put each geom and aes on a new line.
The styler package can do this for you!
This puzzle activity will require knowledge of:
None of us have all these abilities. Each of us has some of these abilities.
During your collaboration, you and your partner will alternate between two roles:
Developer
Coder
Group Norms
Every group should have a ggplot2 cheatsheet!
On the Front
On the Back
The partner whose family name starts first alphabetically starts as the Developer!
Remember:
When you have completed the visualization tasks, you will work as a group to answer the five questions posed at the end of the document.
Each person will input the answers to these questions in the PA3 Canvas quiz and submit either a link to the project or a zipped version of it.
For those recording, you can press the X (top right corner) and wait until the video finishes uploading.
Once finished, you can start work on Lab 3!